CURVE_FIT

Overview

The CURVE_FIT function performs nonlinear least squares regression to fit a user-defined model function to observed data. This is essential for modeling relationships that cannot be captured by simple linear regression, such as exponential decay, power laws, or Gaussian distributions.

This implementation wraps scipy.optimize.curve_fit from the SciPy library. The function minimizes the sum of squared residuals between the observed data and the model predictions:

\min_{\theta} \sum_{i=1}^{M} \left( y_i - f(x_i, \theta) \right)^2

where f(x, \theta) is the model function, \theta represents the parameters to be optimized, and (x_i, y_i) are the data points.
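
As a minimal sketch of this objective, the snippet below fits an illustrative power-law data set with scipy.optimize.curve_fit directly and compares the residual sum of squares at the starting point and at the optimum (the data and starting values are made up for illustration):

import numpy as np
from scipy.optimize import curve_fit

# Illustrative power-law data: y = 2 * x**1.5 (no noise)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x**1.5

def model(x, a, b):
    return a * x**b

p0 = [1.0, 1.0]                               # starting parameters
popt, pcov = curve_fit(model, x, y, p0=p0)

rss_start = np.sum((y - model(x, *p0))**2)    # objective at the starting point
rss_fit = np.sum((y - model(x, *popt))**2)    # objective at the optimum
print(popt, rss_start, rss_fit)               # approx [2.0, 1.5], large, ~0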

The function supports three optimization algorithms via the curve_fit_method parameter; a short sketch of how bounds interact with the method choice follows the list below:

  • Levenberg-Marquardt (lm): The default method for unconstrained problems. It combines gradient descent and Gauss-Newton methods, making it robust for a wide range of problems. Cannot be used with parameter bounds.
  • Trust Region Reflective (trf): Suitable for large-scale problems and supports box constraints on parameters. This method is automatically selected when bounds are provided.
  • Dogleg Box (dogbox): A dogleg trust-region algorithm with rectangular trust regions; its typical use case is small problems with bounds.
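
The sketch below illustrates the interaction between bounds and the method choice, using the same illustrative power-law data as above:

import numpy as np
from scipy.optimize import curve_fit

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x**1.5
model = lambda x, a, b: a * x**b

# Box constraints: 0 <= a <= 10, 0 <= b <= 3
bounds = ([0.0, 0.0], [10.0, 3.0])

# With bounds, SciPy selects 'trf' automatically when no method is given;
# requesting method='lm' together with bounds raises a ValueError.
popt, _ = curve_fit(model, x, y, p0=[1.0, 1.0], bounds=bounds, method="trf")
print(popt)  # approx [2.0, 1.5]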

The function returns fitted parameter values along with their standard errors. Standard errors are computed from the covariance matrix as \sigma_i = \sqrt{\text{diag}(C)_i}, where C is the estimated covariance matrix of the parameters. These errors represent the uncertainty in parameter estimates based on a linear approximation around the optimal solution.
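
A minimal, self-contained sketch of that computation (the slightly noisy power-law data here is illustrative):

import numpy as np
from scipy.optimize import curve_fit

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 5.5, 10.6, 15.8, 22.5])   # power law with a little noise

popt, pcov = curve_fit(lambda x, a, b: a * x**b, x, y, p0=[1.0, 1.0])
perr = np.sqrt(np.diag(pcov))                # one standard error per parameter
print(popt, perr)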

The model expression is specified as a Python lambda string (e.g., lambda x, a, b: a * x**b for a power law). The function automatically extracts parameter names from the lambda signature and supports common mathematical functions including trigonometric, exponential, and logarithmic operations.
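
The following sketch shows one way this extraction can work, using eval with a restricted namespace and inspect.signature; the namespace shown is a simplified stand-in for the fuller one built in the Python Code section below:

import inspect
import numpy as np

expr = "lambda x, a, b: a * x**b"

# Restricted namespace exposing only the math names the expression may use
safe_globals = {"__builtins__": {}, "exp": np.exp, "sin": np.sin, "log": np.log}

model_func = eval(expr, safe_globals)
sig = inspect.signature(model_func)

# The first argument is the independent variable x; the rest are fit parameters
param_names = [p.name for p in list(sig.parameters.values())[1:]]
print(param_names)  # ['a', 'b']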

For more details on the underlying algorithm and advanced options, see the SciPy curve_fit documentation and the related least_squares function.

This example function is provided as-is without any representation of accuracy.

Excel Usage

=CURVE_FIT(model_expr, xdata, ydata, guess, bounds, curve_fit_method)
  • model_expr (str, required): Model expression as a Python lambda function string (e.g. lambda x, a, b: a * x**b for a power law)
  • xdata (list[list], required): Independent variable values as a 2D list (column vector with at least two rows)
  • ydata (list[list], required): Dependent variable values as a 2D list matching xdata length
  • guess (list[list], optional, default: null): Initial parameter values as a single-row 2D list (optional)
  • bounds (list[list], optional, default: null): Parameter bounds as a two-row 2D list, with lower bounds in the first row and upper bounds in the second (optional)
  • curve_fit_method (str, optional, default: null): Solver algorithm (lm, trf, or dogbox) (optional)

Returns (list[list] or str): 2D list where the first row contains parameter names (list[str]), the second row contains the fitted values (list[float]), and the third row contains standard errors (list[float], zeros if unavailable). Returns an error message string on failure.
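
Sketched as a Python literal, a successful result has this shape (the numbers are placeholders):

# Illustrative shape of a successful CURVE_FIT result (numbers are placeholders)
result = [
    ["a", "b"],      # row 1: parameter names
    [2.0, 1.5],      # row 2: fitted values
    [0.0, 0.0],      # row 3: standard errors (zeros if unavailable)
]
names, values, errors = result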

Examples

Example 1: Power law fit y = a * x^b

Inputs:

model_expr: lambda x, a, b: a * x**b

xdata    ydata
1        2
2        5.6568542495
3        10.3923048454
4        16
5        22.360679775

Excel formula:

=CURVE_FIT("lambda x, a, b: a * x**b", {1;2;3;4;5}, {2;5.6568542495;10.3923048454;16;22.360679775})

Expected output:

a b
2 1.5
0 0
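
The same fit can be reproduced from Python by calling the curve_fit wrapper from the Python Code section below (a sketch; it assumes that function is already defined or imported):

xdata = [[1], [2], [3], [4], [5]]
ydata = [[2], [5.6568542495], [10.3923048454], [16], [22.360679775]]

result = curve_fit("lambda x, a, b: a * x**b", xdata, ydata)
print(result)  # approx [['a', 'b'], [2.0, 1.5], [0.0, 0.0]]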

Example 2: Exponential association with TRF method

Inputs:

model_expr: lambda x, A, B: A * (1 - exp(-B * x))
curve_fit_method: trf

xdata    ydata
0        0
1        2.5918177932
2        4.5118836391
3        5.9343034026
4        6.9880578809
5        7.7686983985

Excel formula:

=CURVE_FIT("lambda x, A, B: A * (1 - exp(-B * x))", {0;1;2;3;4;5}, {0;2.5918177932;4.5118836391;5.9343034026;6.9880578809;7.7686983985}, "trf")

Expected output:

A B
10 0.3
0 0

Example 3: Gaussian peak with initial guess

Inputs:

model_expr: lambda x, A, x0, sigma: A * exp(-((x - x0)**2) / (2*sigma**2))
guess: 4.5, 0.2, 0.9

xdata    ydata
-3       0.0555449827
-2       0.6766764162
-1       3.0326532986
0        5
1        3.0326532986
2        0.6766764162
3        0.0555449827

Excel formula:

=CURVE_FIT("lambda x, A, x0, sigma: A * exp(-((x - x0)**2) / (2*sigma**2))", {-3;-2;-1;0;1;2;3}, {0.0555449827;0.6766764162;3.0326532986;5;3.0326532986;0.6766764162;0.0555449827}, {4.5,0.2,0.9})

Expected output:

A x0 sigma
5 0 1
0 0 0

Example 4: Sine wave fit with bounds using dogbox

Inputs:

model_expr: lambda x, A, omega, phi, y0: A * sin(omega*x + phi) + y0
guess: 1.5, 1, 0, 0
bounds: lower = 0, 0.5, -0.5, -0.5; upper = 3, 2.5, 0.5, 0.5
curve_fit_method: dogbox

xdata    ydata
0        0
1        1.7320508076
2        1.7320508076
3        0
4        -1.7320508076
5        -1.7320508076
6        0

Excel formula:

=CURVE_FIT("lambda x, A, omega, phi, y0: A * sin(omega*x + phi) + y0", {0;1;2;3;4;5;6}, {0;1.7320508076;1.7320508076;0;-1.7320508076;-1.7320508076;0}, {1.5,1,0,0}, {0,0.5,-0.5,-0.5;3,2.5,0.5,0.5}, "dogbox")

Expected output:

A omega phi y0
2 1.0472 0 0
0 0 0 0
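
A Python sketch of the same bounded fit via the wrapper from the Python Code section below (assuming that function is defined or imported); note that bounds is a two-row 2D list with lower bounds first:

xdata = [[0], [1], [2], [3], [4], [5], [6]]
ydata = [[0], [1.7320508076], [1.7320508076], [0],
         [-1.7320508076], [-1.7320508076], [0]]
guess = [[1.5, 1, 0, 0]]
bounds = [[0, 0.5, -0.5, -0.5],   # lower bounds for A, omega, phi, y0
          [3, 2.5, 0.5, 0.5]]     # upper bounds

result = curve_fit("lambda x, A, omega, phi, y0: A * sin(omega*x + phi) + y0",
                   xdata, ydata, guess, bounds, "dogbox")
print(result)  # approx [['A', 'omega', 'phi', 'y0'], [2.0, 1.0472, 0.0, 0.0], [...]]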

Python Code

from scipy.optimize import curve_fit as scipy_curve_fit
from scipy import special as sc
import math
import inspect
import numpy as np
import re

def curve_fit(model_expr, xdata, ydata, guess=None, bounds=None, curve_fit_method=None):
    """
    Fit a model expression to xdata, ydata using scipy.optimize.curve_fit.

    See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        model_expr (str): Model expression as a Python lambda function string (e.g. lambda x, a, b: a * x**b for a power law)
        xdata (list[list]): Independent variable values as a 2D list (column vector with at least two rows)
        ydata (list[list]): Dependent variable values as a 2D list matching xdata length
        guess (list[list], optional): Initial parameter values as a single-row 2D list (optional) Default is None.
        bounds (list[list], optional): Parameter bounds as a two-row 2D list (lower bounds in the first row, upper bounds in the second). Default is None.
        curve_fit_method (str, optional): Solver algorithm (lm, trf, or dogbox) (optional) Valid options: Levenberg-Marquardt (LM), Trust Region Reflective (TRF), Dogleg Box (DOGBOX). Default is None.

    Returns:
        list[list] or str: 2D list where the first row contains parameter names (list[str]), the second row contains the fitted values (list[float]), and the third row contains standard errors (list[float], zeros if unavailable). Returns an error message string on failure.
    """
    def _ensure_2d_list(arg):
        """
        Validate that arg is a 2D list with at least two rows and return a 1-D numpy
        array (float64) of the first column values.
        """
        if not isinstance(arg, list) or len(arg) < 2:
            raise ValueError("must be a 2D list with at least two rows")
        vals = []
        for i, row in enumerate(arg):
            if not isinstance(row, list) or len(row) == 0:
                raise ValueError(f"row {i} must be a non-empty list")
            try:
                vals.append(float(row[0]))
            except Exception:
                raise ValueError(f"row {i} contains non-numeric value")
        return np.asarray(vals, dtype=np.float64)

    if not isinstance(model_expr, str):
        return "Invalid input: model_expr must be a string."

    if model_expr.strip() == "":
        return "Invalid input: model_expr must be a non-empty string."

    if "x" not in model_expr:
        return "Invalid input: model_expr must reference variable x (e.g., 'lambda x, a, b: a * x**b')."

    # Auto-convert caret (^) to double asterisk (**) for exponentiation ONCE
    model_expr = re.sub(r'\^', '**', model_expr)

    # Build safe globals dictionary following func_arg_spec.md pattern
    safe_globals = {
        "math": math,
        "np": np,
        "numpy": np,
        "sc": sc,
        "__builtins__": {},
    }

    # Add all math module functions
    safe_globals.update({
        name: getattr(math, name)
        for name in dir(math)
        if not name.startswith("_")
    })

    # Add common numpy/math function aliases
    safe_globals.update({
        "sin": np.sin,
        "cos": np.cos,
        "tan": np.tan,
        "asin": np.arcsin,
        "arcsin": np.arcsin,
        "acos": np.arccos,
        "arccos": np.arccos,
        "atan": np.arctan,
        "arctan": np.arctan,
        "sinh": np.sinh,
        "cosh": np.cosh,
        "tanh": np.tanh,
        "exp": np.exp,
        "log": np.log,
        "ln": np.log,
        "log10": np.log10,
        "sqrt": np.sqrt,
        "abs": np.abs,
        "pow": np.power,
        "pi": math.pi,
        "e": math.e,
        "inf": math.inf,
        "nan": math.nan,
    })

    # Evaluate the model expression to get the callable
    try:
        model_func = eval(model_expr, safe_globals)
    except Exception as exc:
        return f"Error evaluating model_expr: {exc}"

    if not callable(model_func):
        return "Invalid input: model_expr must evaluate to a callable function."

    # Extract parameter names from the model function signature (skip the first arg, which is 'x')
    try:
        sig = inspect.signature(model_func)
        param_names = [p.name for i, p in enumerate(sig.parameters.values()) if i > 0]
    except Exception:
        return "Internal error: unable to inspect model function signature."
    n_params = len(param_names)

    try:
        x_array = _ensure_2d_list(xdata)
    except ValueError as e:
        return f"Invalid input for xdata: {e}"
    try:
        y_array = _ensure_2d_list(ydata)
    except ValueError as e:
        return f"Invalid input for ydata: {e}"

    if x_array.shape[0] != y_array.shape[0]:
        return "Invalid input: xdata and ydata must have the same number of rows."

    # Helper function to normalize 2D list to handle Excel's conversion of single-element 2D lists
    def normalize_to_2d_list(value):
        """Convert a scalar or 1D list to a 2D list for consistent processing."""
        if not isinstance(value, list):
            return [[value]]
        return value

    # Process initial guess
    p0 = None
    if guess is not None:
        guess = normalize_to_2d_list(guess)
        if not isinstance(guess, list) or len(guess) == 0 or not isinstance(guess[0], list):
            return f"Invalid input: guess must be a 2D list (e.g. [[{','.join(param_names)}]])."
        try:
            p0_raw = guess[0]
        except Exception as e:
            return f"Invalid guess structure: {e}"

        try:
            p0_arr = np.asarray(p0_raw, dtype=np.float64).ravel()
            p0 = tuple(float(x) for x in p0_arr.tolist())
        except Exception as e:
            return f"Invalid guess values: {e}"

        if len(p0) != n_params:
            return f"Invalid input: guess must contain {n_params} values matching the model parameters."
    else:
        # scipy requires p0 to determine number of parameters
        # Use 1.0 as default for all parameters as a reasonable starting point
        p0 = tuple([1.0] * n_params)

    # Process bounds
    bounds_tuple = None
    if bounds is not None:
        if not isinstance(bounds, list) or len(bounds) != 2:
            return f"Invalid input: bounds must be a 2D list [[lower1,...],[upper1,...]]."
        if not isinstance(bounds[0], list) or not isinstance(bounds[1], list):
            return f"Invalid input: bounds must be a 2D list [[lower1,...],[upper1,...]]."

        try:
            raw_lower = bounds[0]
            raw_upper = bounds[1]
            if len(raw_lower) != n_params or len(raw_upper) != n_params:
                return f"Invalid input: bounds must contain {n_params} values for lower and {n_params} for upper."
            lower = []
            upper = []
            for v in raw_lower:
                if v is None:
                    lower.append(-np.inf)
                else:
                    lower.append(float(v))
            for v in raw_upper:
                if v is None:
                    upper.append(np.inf)
                else:
                    upper.append(float(v))
        except Exception as e:
            return f"Invalid bounds values: {e}"

        bounds_tuple = (np.asarray(lower, dtype=np.float64), np.asarray(upper, dtype=np.float64))

    # Validate method parameter
    allowed_methods = {"lm", "trf", "dogbox"}
    method_value = None
    if curve_fit_method is not None:
        if isinstance(curve_fit_method, str):
            method_candidate = curve_fit_method.strip().lower()
        else:
            return "Invalid input: curve_fit_method must be one of 'lm', 'trf', or 'dogbox'."
        if method_candidate not in allowed_methods:
            return "Invalid input: curve_fit_method must be one of 'lm', 'trf', or 'dogbox'."
        method_value = method_candidate

    # Validate method vs bounds compatibility
    if method_value == 'lm' and bounds_tuple is not None:
        return "Invalid input: curve_fit_method='lm' cannot be used with bounds; use 'trf' or 'dogbox' instead."

    # With lm, number of observations (M) must be >= number of parameters (N)
    if method_value == 'lm' and x_array.shape[0] < n_params:
        return f"Invalid input: curve_fit_method='lm' requires number of observations >= number of parameters ({x_array.shape[0]} < {n_params})."

    # Wrap the model to ensure outputs are float64
    def _model(x, *params):
        try:
            result = model_func(x, *params)
            return np.asarray(result, dtype=np.float64)
        except Exception as e:
            raise ValueError(f"Model evaluation error: {e}")

    try:
        curve_fit_kwargs = {'maxfev': 10000}
        if p0 is not None:
            curve_fit_kwargs['p0'] = p0
        if bounds_tuple is not None:
            curve_fit_kwargs['bounds'] = bounds_tuple
        if method_value is not None:
            curve_fit_kwargs['method'] = method_value
        popt, pcov = scipy_curve_fit(_model, x_array, y_array, **curve_fit_kwargs)
    except Exception as e:
        return f"Error during curve_fit: {e}"

    try:
        fitted_vals = [float(v) for v in popt]
    except Exception as e:
        return f"Result parsing error: {e}"

    for v in fitted_vals:
        if math.isnan(v) or math.isinf(v):
            return "Fitting produced invalid numeric values (NaN or inf)."

    # Calculate standard errors from covariance matrix
    # If covariance cannot be estimated, return zeros for standard errors
    std_errors = [0.0] * n_params
    try:
        if pcov is not None and np.isfinite(pcov).all():
            std_errors = [float(v) for v in np.sqrt(np.diag(pcov))]
    except Exception:
        # Keep default zeros if calculation fails
        pass

    return [param_names, fitted_vals, std_errors]
